Scaling Up Sparse Support Vector Machines by Simultaneous Feature and Sample Reduction

Authors

  • Weizhong Zhang
  • Bin Hong
  • Wei Liu
  • Jieping Ye
  • Deng Cai
  • Xiaofei He
  • Jie Wang
Abstract

The sparse support vector machine (SVM) is a popular classification technique that can simultaneously learn a small set of the most interpretable features and identify the support vectors. It has achieved great success in many real-world applications. However, for large-scale problems involving a huge number of samples and extremely high-dimensional features, solving sparse SVMs remains challenging. Noting that sparse SVMs induce sparsity in both the feature and sample spaces, we propose a novel approach, based on accurate estimations of the primal and dual optima of sparse SVMs, to simultaneously identify the features and samples that are guaranteed to be irrelevant to the outputs. We can then remove the identified inactive samples and features from the training phase, leading to substantial savings in both memory usage and computational cost without sacrificing accuracy. To the best of our knowledge, the proposed method is the first static feature and sample reduction method for sparse SVM. Experiments on both synthetic and real datasets (e.g., the kddb dataset with about 20 million samples and 30 million features) demonstrate that our approach significantly outperforms state-of-the-art methods and that the resulting speedup can reach orders of magnitude.
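The two kinds of sparsity mentioned in the abstract can be illustrated with a minimal sketch. This is not the screening rule proposed in the paper: it assumes a plain hinge loss and a weight vector `w` that is already known, whereas the paper works from estimates of the primal and dual optima. The data, the vector `w`, and the helper names `dot`, `hinge_loss`, and `split_active` are all hypothetical. The sketch only shows that features with zero weight and samples with margin greater than 1 contribute nothing to the loss at `w`, so dropping them leaves the loss unchanged.

```python
def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wj * xj for wj, xj in zip(w, x))


def hinge_loss(w, X, y):
    """Sum of hinge losses max(0, 1 - y_i * <w, x_i>) over all samples."""
    return sum(max(0.0, 1.0 - yi * dot(w, xi)) for xi, yi in zip(X, y))


def split_active(w, X, y):
    """Return indices of features with nonzero weight and of samples
    whose margin y_i * <w, x_i> is at most 1 (nonzero or boundary loss)."""
    active_features = [j for j, wj in enumerate(w) if wj != 0.0]
    active_samples = [i for i, (xi, yi) in enumerate(zip(X, y))
                      if yi * dot(w, xi) <= 1.0]
    return active_features, active_samples


# Hypothetical toy data: 4 samples, 3 features, labels in {+1, -1}.
X = [[2.0, 0.5, 0.0],
     [0.5, -1.0, 3.0],
     [-0.25, 2.0, 1.0],
     [-3.0, 0.0, -2.0]]
y = [1, 1, -1, -1]
w = [1.0, 0.0, 0.0]  # assumed sparse solution: only feature 0 is used

active_features, active_samples = split_active(w, X, y)

# Build the reduced problem from the active rows and columns only.
X_red = [[X[i][j] for j in active_features] for i in active_samples]
y_red = [y[i] for i in active_samples]
w_red = [w[j] for j in active_features]

full_loss = hinge_loss(w, X, y)
reduced_loss = hinge_loss(w_red, X_red, y_red)
print(active_features, active_samples, full_loss, reduced_loss)
```

In this toy case the reduced problem keeps one of three features and two of four samples. The practical difficulty addressed in the abstract is identifying such inactive features and samples before the solution is known.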

Similar resources

Supplemental Material: Scaling Up Sparse Support Vector Machines by Simultaneous Feature and Sample Reduction

Weizhong Zhang* (1, 2), Bin Hong* (1, 3), Wei Liu (2), Jieping Ye (3), Deng Cai (1), Xiaofei He (1), Jie Wang (3). 1 State Key Lab of CAD&CG, Zhejiang University, China; 2 Tencent AI Lab, Shenzhen, China; 3 University of Michigan, USA. In this supplement, we first present the detailed proofs of all the theorems in the main text and then report the remaining experimental results, which were omitted from the experiment section due ...

Trading Accuracy for Size: Online Small SVMs via Linear Independence in the Feature Space

Support Vector Machines (SVMs) are a machine learning method rooted in statistical learning theory. One of their most interesting characteristics is that the solution obtained during training is sparse, meaning that only a few samples (the so-called support vectors) are usually considered “important” by the algorithm and account for most of the complexity of the classification/regression task. I...

SVM Classifier Incorporating Feature Selection Using GA for Spam Detection

The use of SVMs (Support Vector Machines) for classifying e-mail as spam or non-spam, incorporating feature selection by a GA (Genetic Algorithm), is investigated. A GA approach, named GA-SVM, is adopted to select the features that are most favorable to the SVM classifier. A scaling factor is exploited to measure the relevance of each feature to the classification task and is estimated by...

Karhunen-Loeve Transform and Sparse Representation Based Plant Leaf Disease Recognition

To improve the classification accuracy for apple leaf disease images and solve the problem of dimension redundancy in feature extraction, the Karhunen-Loeve (K-L) transform and sparse representation are applied to apple leaf disease recognition. First, 9 color features and 8 texture features of diseased leaf images are extracted and taken as feature vectors after dimensionality reduction by the...

Publication date: 2017